Obj2Text: Generating Visually Descriptive Language from Object Layouts
Abstract
Generating captions for images is a task that has recently received considerable attention. In this work we focus on caption generation for abstract scenes, or object layouts, where the only information provided is a set of objects and their locations. We propose OBJ2TEXT, a sequence-to-sequence model that encodes a set of objects and their locations as an input sequence using an LSTM network, and decodes this representation using an LSTM language model. We show that our model, despite encoding object layouts as a sequence, can represent spatial relationships between objects and generate descriptions that are globally coherent and semantically relevant. We test our approach on the task of object-layout captioning, using only object annotations as inputs. We additionally show that our model, combined with a state-of-the-art object detector, improves an image captioning model from 0.863 to 0.950 (CIDEr score) on the test benchmark of the standard MS-COCO Captioning task.
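As a rough illustration of what "encoding a set of objects and their locations as an input sequence" could look like, the sketch below serializes an object layout into an ordered list of (category id, normalized box) steps that an LSTM encoder could consume. The names `LayoutObject` and `layout_to_sequence`, the left-to-right ordering, and the box normalization are all assumptions made for this example; the abstract does not specify how the model orders or represents its inputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


@dataclass(frozen=True)
class LayoutObject:
    """One annotated object: a category label and its bounding box (hypothetical type)."""
    category: str
    box: Box


def layout_to_sequence(
    objects: List[LayoutObject],
    vocab: Dict[str, int],
    image_size: Tuple[int, int],
) -> List[Tuple[int, Box]]:
    """Turn an unordered object layout into an ordered input sequence.

    Assumption: objects are sorted left-to-right, then top-to-bottom. This is
    only one plausible way to linearize a set; it is not taken from the paper.
    """
    width, height = image_size
    ordered = sorted(objects, key=lambda o: (o.box[0], o.box[1]))
    sequence = []
    for obj in ordered:
        x, y, w, h = obj.box
        # Normalize coordinates to [0, 1] so the encoding is resolution independent.
        normalized = (x / width, y / height, w / width, h / height)
        sequence.append((vocab[obj.category], normalized))
    return sequence


if __name__ == "__main__":
    vocab = {"dog": 0, "frisbee": 1}
    layout = [
        LayoutObject("frisbee", (320.0, 120.0, 64.0, 48.0)),
        LayoutObject("dog", (64.0, 240.0, 256.0, 192.0)),
    ]
    for step in layout_to_sequence(layout, vocab, (640, 480)):
        print(step)
```

Each step of the resulting sequence pairs a discrete category index with a continuous location vector. In a full model these would typically be embedded and fed to the encoder one step at a time.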
Similar resources
Baby Talk: Understanding and Generating Image Descriptions
We posit that visually descriptive language offers computer vision researchers both information about the world, and information about how people describe the world. The potential benefit from this source is made more significant due to the enormous amount of language data easily available today. We present a system to automatically generate natural language descriptions from images that exploi...
Synthesis of Indexing Expressions for Complex Data Layouts
We present a technique for generating and optimizing expressions for indexing complex array layouts. Our technique is built around a declarative, domain-specific layout language that provides support for arbitrarily-nested row-major, column-major, Z-Morton, and Hilbert curve layouts. To maintain programmability, we maintain a ‘logical,’ two-dimensional view of the data in physical memory, allow...
UML Modeling for Visually-Impaired Persons
Software modeling is generally a collaborative activity and typically involves graphical diagrams. The Unified Modeling Language (UML) is the de facto standard for modeling object-oriented software. It provides notations for modeling a system’s structural information (e.g. databases, sensors, controllers, etc.), and behavior, depicting the functionality of the software. Because UML relies heavi...
AVDT - Automatic Visualization of Descriptive Texts
Expressing mental images visually as 3D scenes is a time-consuming challenge. Therefore, we employ natural language to facilitate the creation of virtual environments. In this paper, we present a framework, which automatically converts an arbitrary descriptive text into a representative 3D scene. Our system parses a user-written input text, extracts information using techniques from Natural Lan...
A Graphical Data Model For CASE
Computer-aided software engineering (CASE) applications involve several special data modeling requirements, not the least of which is the need to store graphical representations of complex descriptive data. This paper describes the EARNG data model, which is designed to serve as the basis for storage of such data in the context of CASE support tools. The model enables integrated storage of graph...
Publication date: 2017